Abstract

This paper suggests a learning-theoretic perspective on how synaptic plasticity
benefits global brain functioning. We introduce a model, the selectron, that (i)
arises as the fast time constant limit of leaky integrate-and-fire neurons equipped
with spike-timing dependent plasticity (STDP) and (ii) is amenable to theoretical
analysis. We show that the selectron encodes reward estimates into spikes and that
an error bound on spikes is controlled by a spiking margin and the sum of synaptic
weights. Moreover, the efficacy of spikes (their usefulness to other reward-maximizing
selectrons) also depends on total synaptic strength. Finally, based on our
analysis, we propose a regularized version of STDP, and show the regularization
improves the robustness of neuronal learning when faced with multiple stimuli.

1 Introduction 

Finding principles underlying learning in neural networks is an important problem for both artificial 
and biological networks. An elegant suggestion is that global objective functions may be optimized 
during learning [1]. For biological networks, however, the currently known neural plasticity
mechanisms use a very restricted set of data, largely consisting of spikes and diffuse neuromodulatory
signals. How a global optimization procedure could be implemented at the neuronal (cellular) level
is thus a difficult problem.

A successful approach to this question has been Rosenblatt's perceptron [2] and its extension to
multilayer perceptrons via backpropagation [3]. Similarly, (restricted) Boltzmann machines,
constructed from simple stochastic units, have provided a remarkably powerful approach to organizing
distributed optimization across many layers [4]. By contrast, although there has been significant
progress in developing and understanding more biologically realistic models of neuronal learning
[5–10], these do not match the performance of simpler, more analytically and computationally
tractable models in learning tasks.

Overview. This paper constructs a bridge from biologically realistic to analytically tractable
models. The selectron is a model derived from leaky integrate-and-fire neurons equipped with
spike-timing dependent plasticity that is amenable to learning-theoretic analysis. Our aim is to extract
some of the principles implicit in STDP by thoroughly investigating a limit case.

Section 2 introduces the selectron. We state a constrained reward maximization problem which
implies that selectrons encode empirical reward estimates into spikes. Our first result, §3,
is that the selectron arises as the fast time constant limit of well-established models of neuronal
spiking and plasticity, suggesting that cortical neurons may also be encoding reward estimates into
their spiketrains.

Two important questions immediately arise. First, what guarantees can be provided on spikes being 
reliable predictors of global (neuromodulatory) outcomes? Second, what guarantees can be provided 
on the usefulness of spikes to other neurons? Sections 4 and 5 answer these questions by providing
an upper bound on a suitably defined 0/1 loss and a lower bound on the efficacy of a selectron's
spikes, measured in terms of its contribution to the expected reward of a downstream selectron.
Both bounds are controlled by the sum of synaptic weights $\|w\|_1$, thereby justifying the constraint
introduced in §2. Finally, motivated by our analysis, §6 introduces a regularized STDP rule and
shows that it learns more robustly than classical STDP. §7 concludes the paper. Proofs of theorems
are provided in the supplementary material. 

Related work. Spike-timing dependent plasticity and its implications for the neural code have 
been intensively studied in recent years. The work closest in spirit to our own is Seung's "hedonistic" 
synapses, which seek to increase average reward [6]. Our work provides guarantees on the finite 
sample behavior of a discrete-time analog of hedonistic neurons. Another related line of research 
derives from the information bottleneck method [9–11], which provides an alternate constraint to the
one considered here. An information-theoretic perspective on synaptic homeostasis and metabolic
cost, complementing the results in this paper, can be found in [12, 13]. Simulations combining
synaptic renormalization with burst-STDP can be found in [14].

Important aspects of plasticity that we have not considered here are properties specific to
continuous-time models, such as STDP's behavior as a temporal filter [15], and also issues related to
convergence [8, 10].

The learning-theoretic properties of neural networks have been intensively studied, mostly focusing
on perceptrons, see for example [16]. A non-biologically motivated "large-margin" analog of the
perceptron was proposed in [17].

2 The selectron 

We introduce the selectron, which can be considered a biologically motivated adaptation of the 
perceptron, see §3. The mechanism governing whether or not the selectron spikes is a Heaviside
function acting on a weighted sum of synaptic inputs; our contribution is to propose a new reward 
function and corresponding learning rule. 

Let us establish some notation. Let $X$ denote the set of $N$-dimensional $\{0,1\}$-valued vectors
forming synaptic inputs to a selectron, and $Y = \{0,1\}$ the set of outputs. A selectron spikes according
to

\[ y = f_w(x) := H(w^\top x - \vartheta), \quad \text{where} \quad H(z) := \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{else} \end{cases} \tag{1} \]

is the Heaviside function and $w$ is a $[0,1] \subset \mathbb{R}$ valued $N$-vector specifying the selectron's synaptic
weights. Let $P(x)$ denote the probability of input $x$ arising.

To model the neuromodulatory system we introduce random variable $\nu : X \to \{-1, 0, +1\}$, where
positive values correspond to desirable outcomes, negative to undesirable and zero to neutral. Let
$P(\nu|x)$ denote the probability of the release of neuromodulatory signal subsequent to input $x$.

Definition 1. Define reward function

\[ R(x, f_w, \nu) = \underbrace{\nu(x)}_{\text{neuromodulators}} \cdot \underbrace{(w^\top x - \vartheta)}_{\text{margin}} \cdot \underbrace{f_w(x)}_{\text{selectivity}} = \begin{cases} \nu(x) \cdot (w^\top x - \vartheta) & \text{if } y = 1 \\ 0 & \text{else.} \end{cases} \tag{2} \]

The reward consists of three components. The first term is the neuromodulatory signal, which acts as
a supervisor. The second term is the total current $w^\top x$ minus the threshold $\vartheta$. It is analogous to the
margin in support vector machines or boosting algorithms, see §4 for a precise formulation.
The third term gates rewards according to whether or not the selectron spikes. The reward is thus
selected¹: neuromodulatory signals are ignored by the selectron's reward function when it does not
spike, enabling specialization.

Constrained reward maximization. The selectron solves the following optimization problem:

\[ \text{maximize:} \quad \widehat{R}_n := \sum_{i=1}^{n} \nu(x^{(i)}) \cdot (w^\top x^{(i)} - \vartheta) \cdot f_w(x^{(i)}) \tag{3} \]

subject to: $\|w\|_1 \le \omega$ for some $\omega > 0$.

Remark 1 (spikes encode rewards).

Optimization problem (3) ensures that selectrons spike for inputs that, on the basis of their empirical
sample, reliably lead to neuromodulatory rewards. Thus, spikes encode expectations about rewards.

The constraint is motivated by the discussion after Theorem 1 and the analysis in §4 and §5. We
postpone discussion of how to impose the constraint to §6 and focus on reward maximization here.

The reward maximization problem cannot be solved analytically in general. However, it is possible
to use an iterative approach. Although $f_w(x)$ is not continuous, the reward function is a continuous
function of $w$ and is differentiable everywhere except for the "corner" where $w^\top x - \vartheta = 0$. We
therefore apply gradient ascent by computing the derivative of (3) with respect to synaptic weights
to obtain online learning rule

\[ \Delta w_j = \alpha \cdot \nu(x) \cdot x_j \cdot f_w(x) = \begin{cases} \alpha \cdot \nu(x) & \text{if } x_j = 1 \text{ and } y = 1 \\ 0 & \text{else} \end{cases} \tag{4} \]

where update factor $\alpha$ controls the learning rate.

The learning rule is selective: regardless of the neuromodulatory signal, synapse $w_{jk}$ is updated
only if there is both an input $x_j = 1$ and output spike $y = f_w(x) = 1$.

The selectron is not guaranteed to find a global optimum. It is prone to initial condition dependent
local optima because rewards depend on output spikes in learning rule (4). Although this is an
undesirable property for an isolated learner, it is less important, and perhaps even advantageous, in
large populations where it encourages specialization.
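
As an illustration of the spiking mechanism and the selective update, the following is a minimal Python sketch, not the authors' implementation. The function names `spikes` and `update` are hypothetical, and clipping weights to [0, 1] after each step is an assumption made here to keep weights in the range stated for $w$.

```python
def spikes(w, x, theta):
    """Heaviside spiking rule: 1 if the total current w.x exceeds theta, else 0."""
    current = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if current > theta else 0


def update(w, x, nu, theta, alpha):
    """One step of the selective learning rule.

    A synapse changes only if its input is active (x_j = 1) and the
    selectron spiked (y = 1); the sign of the change follows nu.
    Clipping to [0, 1] is an assumption of this sketch.
    """
    y = spikes(w, x, theta)
    new_w = [min(1.0, max(0.0, wj + alpha * nu * xj * y)) for wj, xj in zip(w, x)]
    return new_w, y


if __name__ == "__main__":
    w, y = update([0.5, 0.5, 0.5], [1, 1, 0], nu=1, theta=0.8, alpha=0.1)
    print(y, w)
```

When the selectron does not spike, the weights are returned unchanged whatever the value of `nu`, which is the selectivity property described above.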

Remark 2 (unsupervised setting).

Define the unsupervised setting by $\nu(x) = 1$ for all $x$. The reward function reduces to $R(x, f_w) =
(w^\top x - \vartheta) \cdot f_w(x)$. Without the constraint synapses will saturate. Imposing the constraint yields a
more interesting solution where the selectron finds a weight vector summing to $\omega$ which balances (i)
frequent spikes and (ii) high margins.

Theorem 1 (Controlling the frequency of spikes).

Assuming synaptic inputs are i.i.d. Bernoulli variables with P(spike) = $p$, then

\[ P(f_w(x) = 1) \le p \cdot \left(\frac{\|w\|_1}{\vartheta}\right)^2 \le p \cdot \left(\frac{\omega}{\vartheta}\right)^2. \]



The Bernoulli regime is the discrete-time analog of the homogeneous Poisson setting used to prove
convergence of reward-modulated STDP in [8]. Interestingly, in this setting the constraint provides
a lever for controlling (lower bounding) rewards per spike

\[ \{\text{reward per spike}\} = \frac{\widehat{R}}{P(f_w(x) = 1)} \ge c_1 \cdot \frac{\widehat{R}}{\omega^2}. \]

If inputs are not Bernoulli i.i.d., then $P(y = 1)$ and $\omega$ still covary, although the precise relationship is
more difficult to quantify. Although i.i.d. inputs are unrealistic, note that recent neurophysiological
evidence suggests neuronal firing, even of nearby neurons, is uncorrelated [18].
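
The bound of Theorem 1 can be checked numerically. The sketch below is a simple Monte Carlo estimate under the theorem's i.i.d. Bernoulli assumption; the helper names and the particular weights, threshold and input rate are illustrative choices, not values from the paper.

```python
import random


def spike_probability(w, theta, p, trials, seed=0):
    """Estimate P(f_w(x) = 1) when each input is 1 independently with probability p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        current = sum(wj for wj in w if rng.random() < p)
        hits += current > theta
    return hits / trials


def theorem1_bound(w, theta, p):
    """Upper bound p * (||w||_1 / theta)^2, for nonnegative weights."""
    return p * (sum(w) / theta) ** 2


if __name__ == "__main__":
    w = [0.25] * 8          # illustrative weights with ||w||_1 = 2
    theta, p = 1.5, 0.3     # illustrative threshold and input rate
    print(spike_probability(w, theta, p, trials=20000), theorem1_bound(w, theta, p))
```

In this example the estimate sits far below the bound, which is expected: the bound comes from a second-moment inequality and is not tight.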

1 The name "selectron" was chosen to emphasize this selective aspect. 
3 Relation to leaky integrate-and-fire neurons equipped with STDP



The literature contains an enormous variety of neuronal models, which vary dramatically in
sophistication and the extent to which they incorporate the details of the underlying biochemical
processes. Similarly, there is a large menagerie of models of synaptic plasticity [19]. We consider
two well-established models: Gerstner's Spike Response Model (SRM) which generalizes leaky
integrate-and-fire neurons [20] and the original spike-timing dependent plasticity learning rule
proposed by Song et al [5], and show that the selectron arises in the fast time constant limit of the two
models.

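First let us recall the SRM. Suppose neuron $n^k$ last outputted a spike at time $t_k$ and receives input
spikes at times $t_j$ from neuron $n^j$. Neuron $n^k$ spikes or not according to the Heaviside function applied
to the membrane potential $M_w$:

\[ f_w(t) = H(M_w(t) - \vartheta) \quad \text{where} \quad M_w(t) = \eta(t - t_k) + \sum_{t_j \le t} w_{jk} \cdot \epsilon(t - t_j) \quad \text{at time } t \ge t_k. \]

Input and output spikes add

\[ \epsilon(t - t_j) = K \left[ e^{\frac{t_j - t}{\tau_m}} - e^{\frac{t_j - t}{\tau_s}} \right] \quad \text{and} \quad \eta(t - t_k) = \vartheta \left[ K_1 e^{\frac{t_k - t}{\tau_m}} - K_2 \left( e^{\frac{t_k - t}{\tau_m}} - e^{\frac{t_k - t}{\tau_s}} \right) \right] \]

to the membrane potential for $t_j \le t$ and $t_k \le t$ respectively. Here $\tau_m$ and $\tau_s$ are the membrane and
synapse time constants.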

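The original STDP update rule [5] is

\[ \Delta w_{jk} = \begin{cases} \alpha_+ \cdot \exp\left(\frac{t_j - t_k}{\tau_+}\right) & \text{if } t_j \le t_k \\ -\alpha_- \cdot \exp\left(\frac{t_k - t_j}{\tau_-}\right) & \text{else} \end{cases} \tag{5} \]

where $\tau_+$ and $\tau_-$ are time constants. STDP potentiates input synapses that spike prior to output
spikes and depotentiates input synapses that spike subsequent to output spikes.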
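
The shape of the STDP window can be sketched in a few lines of Python. This assumes the standard two-sided exponential form of the rule; the function name `stdp_update` is hypothetical, and the default parameters are the values quoted later in the Experiments section ($\alpha_+ = 0.03125$, $\tau_+ = 16.8$, $\alpha_- = 0.85\alpha_+$, $\tau_- = 33.7$).

```python
import math


def stdp_update(t_pre, t_post, a_plus=0.03125, tau_plus=16.8,
                a_minus=0.85 * 0.03125, tau_minus=33.7):
    """Weight change for one input spike at t_pre and one output spike at t_post.

    Inputs arriving at or before the output spike are potentiated;
    inputs arriving after it are depotentiated. Both effects decay
    exponentially with the time difference.
    """
    if t_pre <= t_post:
        return a_plus * math.exp((t_pre - t_post) / tau_plus)
    return -a_minus * math.exp((t_post - t_pre) / tau_minus)


if __name__ == "__main__":
    for dt in (-20.0, -5.0, 0.0, 5.0, 20.0):
        print(dt, stdp_update(t_pre=0.0, t_post=dt))
```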

Theorem 2 (the selectron is the fast time constant limit of SRM + STDP).

In the fast time constant limit, $\lim \tau_\bullet \to 0$, the SRM transforms into a selectron with

\[ f_w(t) = H(M_w(t) - \vartheta) \quad \text{where} \quad M_w = \sum_{\{j \,|\, t_j \ge t_k\}} w_{jk} \cdot \delta_{t_k}(t). \]

Moreover, STDP transforms into learning rule (4) in the unsupervised setting with $\nu(x) = 1$ for all
$x$. Finally, STDP arises as gradient ascent on a reward function whose limit is the unsupervised
setting of reward function (2).

Theorem 2 shows that STDP implicitly maximizes a time-discounted analog of the reward function
in (3). We expect many models of reward-modulated synaptic plasticity to be analytically tractable
in the fast time constant limit. An important property shared by STDP and the selectron is that
synaptic (de)potentiation is gated by output spikes, see §A.1 for a comparison with the perceptron
which does not gate synaptic learning.



4 An error bound 

Maximizing reward function (2) implies that selectrons encode reward estimates into their spikes.
Indeed, it recursively justifies incorporating spikes into the reward function via the margin
$(w^\top x - \vartheta)$, which only makes sense if upstream spikes predict reward. However, in a large system where
estimates pile on top of each other there is a tendency to overfit, leading to poor generalization [21].
It is therefore crucial to provide guarantees on the quality of spikes as estimators.

Boosting algorithms, where the outputs of many weak learners are aggregated into a classifier [22],
are remarkably resistant to overfitting as the number of learners increases [23]. Cortical learning may
be analogous to boosting: individual neurons have access to a tiny fraction of the total brain state,
and so are weak learners; and in the fast time constant limit, neurons are essentially aggregators.
We sharpen the analogy using the selectron. As a first step towards understanding how the cortex
combats overfitting, we adapt a theorem developed to explain the effectiveness of boosting [24]. The
goal is to show how the margin and constraint on synaptic weights improve error bounds.

Definition 2. A selectron incurs a 0/1 loss if a spike is followed by negative neuromodulatory
feedback

\[ l(x, f_w, \nu) := \begin{cases} 1 & \text{if } y = 1 \text{ and } \nu(x) = -1 \\ 0 & \text{else.} \end{cases} \tag{6} \]

The 0/1 loss fails to take the estimates (spikes) of other selectrons into account and is difficult to
optimize, so we also introduce the hinge loss:

\[ h^\kappa(x, f_w, \nu) := \left(\kappa - (w^\top x - \vartheta) \cdot \nu(x)\right)_+ \cdot f_w(x), \quad \text{where} \quad (x)_+ := \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{else.} \end{cases} \tag{7} \]

Note that $l \le h^\kappa$ for all $\kappa \ge 1$. Parameter $\kappa$ controls the saturation point, beyond which the size of
the margin makes no difference to $h^\kappa$.

An alternate 0/1 loss² penalizes a selectron if it (i) fires when it shouldn't, i.e. when $\nu(x) = -1$
or (ii) does not fire when it should, i.e. when $\nu(x) = 1$. However, since the cortex contains
many neurons and spiking is metabolically expensive [25], we propose a conservative loss that only
penalizes errors of commission ("first, do no harm") and does not penalize specialization.
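
To make the two losses concrete, here is a small Python sketch. The function names are hypothetical; the code follows the definitions above, with the spike taken to be 1 exactly when the margin is positive.

```python
def margin(w, x, theta):
    """Total current minus threshold."""
    return sum(wj * xj for wj, xj in zip(w, x)) - theta


def zero_one_loss(w, x, nu, theta):
    """1 if the selectron spikes and the neuromodulatory signal is -1, else 0."""
    y = 1 if margin(w, x, theta) > 0 else 0
    return 1 if (y == 1 and nu == -1) else 0


def hinge_loss(w, x, nu, theta, kappa=1.0):
    """Gated hinge loss: (kappa - margin * nu)_+ when spiking, 0 otherwise."""
    m = margin(w, x, theta)
    y = 1 if m > 0 else 0
    return max(0.0, kappa - m * nu) * y


if __name__ == "__main__":
    w, theta = [0.6, 0.6], 0.5
    for x in ([1, 1], [1, 0], [0, 0]):
        for nu in (-1, 0, 1):
            print(x, nu, zero_one_loss(w, x, nu, theta), hinge_loss(w, x, nu, theta))
```

Whenever the selectron spikes on a punished input, the margin is positive, so the hinge loss is at least `kappa`; this is why the hinge loss dominates the 0/1 loss for `kappa >= 1`.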

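Theorem 3 (spike error bound).

Suppose each selectron has $\le N$ synapses. For any selectron $n^k$, let $S^k = \{n^k\} \cup \{n^j : n^j \to n^k\}$
denote a 2-layer feedforward subnetwork. For all $\kappa \ge 1$, with probability at least $1 - \delta$,

\[ \underbrace{E\left[l(x, f_w, \nu)\right]}_{\text{0/1 loss}} \le \underbrace{\frac{1}{n} \sum_{i} h^\kappa\left(x^{(i)}, f_w, \nu(x^{(i)})\right)}_{\text{hinge loss}} + \{\text{capacity term}\} + \{\text{confidence term}\}, \]

where the capacity term scales linearly with $\omega$ and depends on $N$ and $n$, the confidence term is
proportional to $2B$ and depends on $\delta$ and $n$, and $B = \kappa + \omega - \vartheta$.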

Remark 3 (theoretical justification for maximizing margin and constraining $\|w\|_1$).
The theorem shows how subsets of distributed systems can avoid overfitting. First, it demonstrates
the importance of maximizing the margin (i.e. the empirical reward). Second, it shows the capacity
term depends on the number of synapses $N$ and the constraint $\omega$ on synaptic weights, rather than
the capacity of $S^k$, which can be very large.

The hinge loss is difficult to optimize directly since gating with output spikes $f_w(x)$ renders it
discontinuous. However, in the Bernoulli regime, Theorem 1 implies the bound in Theorem 3 can
be rewritten as

\[ E[l(x, f_w, \nu)] \le p \kappa \frac{\omega^2}{\vartheta^2} - \widehat{R}_n\left(x^{(i)}, f_w, \nu(x^{(i)})\right) + \omega \cdot \{\text{capacity term}\} + \{\text{confidence term}\} \tag{8} \]

and so $\omega$ again provides the lever required to control the 0/1 loss. The constraint $\|w\|_1 \le \omega$ is best
imposed offline, see §6.



5 A bound on the efficacy of inter-neuronal communication 

Even if a neuron's spikes perfectly predict positive neuromodulatory signals, the spikes only matter 
to the extent they affect other neurons in cortex. Spikes are produced for neurons by neurons. It is 
therefore crucial to provide guarantees on the usefulness of spikes. 

In this section we quantify the effect of one selectron's spikes on another selectron's expected re- 
ward. We demonstrate a lower bound on efficacy and discuss its consequences. 



2 See §A.5 for an error bound.
Definition 3. The efficacy of spikes from selectron $n^j$ on selectron $n^k$ is

\[ \frac{\delta R^k}{\delta x_j} := \frac{E[R^k \,|\, x_j = 1] - E[R^k \,|\, x_j = 0]}{1 - 0}, \]

i.e. the expected contribution of spikes from selectron $n^j$ to selectron $n^k$'s expected reward, relative
to not spiking. The notation is intended to suggest an analogy with differentiation: the infinitesimal
difference made by spikes on a single synapse.

Efficacy is zero if $E[R^k \,|\, x_j = 1] = E[R^k \,|\, x_j = 0]$. In other words, if spikes from $n^j$ make no
difference to the expected reward of $n^k$.
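
The efficacy can be estimated from data by comparing average rewards with and without an upstream spike. The following Python sketch assumes a list of (input, neuromodulatory signal) samples; the function name `efficacy` and the sample format are illustrative assumptions rather than anything specified in the paper.

```python
def efficacy(samples, w, theta, j):
    """Empirical E[R | x_j = 1] - E[R | x_j = 0] for a selectron with weights w.

    samples is a list of (x, nu) pairs, where x is a 0/1 input vector and
    nu is the neuromodulatory signal in {-1, 0, +1}.
    """
    totals = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, nu in samples:
        m = sum(wi * xi for wi, xi in zip(w, x)) - theta
        reward = nu * m * (1 if m > 0 else 0)   # reward is gated by the spike
        totals[x[j]] += reward
        counts[x[j]] += 1
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("need samples with x_j = 0 and with x_j = 1")
    return totals[1] / counts[1] - totals[0] / counts[0]


if __name__ == "__main__":
    data = [((1, 1), 1), ((0, 1), 1)]
    print(efficacy(data, w=[0.5, 0.5], theta=0.4, j=0))
```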

The following theorem relies on the assumption that the average contribution of neuromodulators is
higher after $n^j$ spikes than after it does not spike (i.e. upstream spikes predict reward), see §A.6 for
precise statement. When the assumption is false the synapse $w_{jk}$ should be pruned.

Theorem 4 (spike efficacy bound).

Let $p_j := E[Y^j]$ denote the frequency of spikes from neuron $n^j$. The efficacy of $n^j$'s spikes on $n^k$ is
lower bounded by

\[ \underbrace{c_2 \cdot \frac{\delta R^k}{\delta x_j}}_{\text{efficacy}} \ge \underbrace{\frac{w_j \cdot E[Y^j Y^k]}{p_j}}_{w_j\text{-weighted co-spike frequency}} + \underbrace{\frac{E\left[Y^j Y^k \cdot \left((w^{\setminus j})^\top x - \vartheta\right)\right]}{p_j (1 - p_j)}}_{\text{co-spike frequency}} - \underbrace{\frac{E\left[Y^k \cdot \left((w^{\setminus j})^\top x - \vartheta\right)\right]}{1 - p_j}}_{n^k \text{ spike frequency}} \tag{9} \]

where $c_2$ is described in §A.6 and $(w^{\setminus j})_i := w_i$ if $i \ne j$ and $0$ if $i = j$.

The efficacy guarantee is interpreted as follows. First, the guarantee improves as co-spiking by $n^j$
and $n^k$ increases. However, the denominators imply that increasing the frequency of $n^j$'s spikes
worsens the guarantee, insofar as $n^j$ is not correlated with $n^k$. Similarly, from the third term,
increasing $n^k$'s spikes worsens the guarantee if they do not correlate with $n^j$.

An immediate corollary of Theorem 4 is that Hebbian learning rules, such as STDP and the selectron
learning rule (4), improve the efficacy of spikes. However, it also shows that naively increasing the
frequency of spikes carries a cost. Neurons therefore face a tradeoff. In fact, in the Bernoulli regime,
Theorem 1 implies (9) can be rewritten as

\[ c_2 \cdot \frac{\delta R^k}{\delta x_j} \ge \frac{w_j \cdot E[Y^j Y^k]}{p} + \frac{E\left[Y^j Y^k \cdot \left((w^{\setminus j})^\top x - \vartheta\right)\right]}{p(1 - p)} - \frac{p \cdot \omega^2 \cdot (\omega - \vartheta)}{(1 - p)\, \vartheta^2}, \tag{10} \]

so the constraint $\omega$ on synaptic strength can be used as a lever to improve guarantees on efficacy.

Remark 4 (efficacy improved by pruning weak synapses).

The first term in (9) suggests that pruning weak synapses increases the efficacy of spikes, and so may
aid learning in populations of selectrons or neurons.



6 Experiments 

Cortical neurons are constantly exposed to different input patterns as organisms engage in different 
activities. It is therefore important that what neurons learn is robust to changing inputs [26, 27]. In
this section, as proof of principle, we investigate a simple tweak of classical STDP involving offline 
regularization. We show that it improves robustness when neurons are exposed to more than one 
pattern. 

Observe that regularizing optimization problem (3) yields

\[ \text{maximize:} \quad \sum_{i=1}^{n} R\left(x^{(i)}, f_w, \nu(x^{(i)})\right) - \frac{\gamma}{2} \left(\|w\|_1 - \omega\right)^2 \tag{11} \]

\[ \text{learning rule:} \quad \Delta w_j = \alpha \cdot \nu(x) \cdot x_j \cdot f_w(x) - \gamma \cdot \left(\|w\|_1 - \omega\right) \cdot w_j \tag{12} \]

which incorporates synaptic renormalization directly into the update. However, (12) requires continuously
re-evaluating the sum of synaptic weights. We therefore decouple learning into an online reward
maximization phase and an offline regularization phase which resets the synaptic weights.
A similar decoupling may occur in cortex. It has recently been proposed that a function of NREM
sleep may be to regulate synaptic weights [28]. Indeed, neurophysiological evidence suggests that
average cortical firing rates increase during wakefulness and decrease during sleep, possibly
reflecting synaptic strengths [29, 30]. Experimental evidence also points to a net increase in dendritic
spines (synapses) during waking and a net decrease during sleep [31].



Setup. We trained a neuron on a random input pattern for 10s to 87% accuracy with regularized
STDP. See §A.7 for details on the structure of inputs. We then performed 700 trials (350 classical
and 350 regularized) exposing the neuron to a new pattern for 20 seconds and observed performance
under classical and regularized STDP.

SRM neurons with classical STDP. We used Gerstner's SRM model, recall §3, with parameters
chosen to exactly coincide with [32]: $\tau_m = 10$, $\tau_s = 2.5$, $K = 2.2$, $K_1 = 2$, $K_2 = 4$ and
$\vartheta = \frac{1}{4} \cdot \#\text{synapses}$. STDP was implemented via (5) with parameters $\alpha_+ = 0.03125$, $\tau_+ = 16.8$,
$\alpha_- = 0.85\alpha_+$ and $\tau_- = 33.7$ also taken from [32]. Synaptic weights were clipped to fall in [0, 1].

Regularized STDP consists of a small tweak of classical STDP in the online phase, and an additional
offline regularization phase:

• Online. In the online phase, reduce the depotentiation bias from $0.85\alpha_+$ in the classical
implementation to $\alpha_- = 0.75\alpha_+$.

• Offline. In the offline phase, modify synapses once per second according to

\[ \Delta w_j = \begin{cases} \gamma \cdot \left(\tfrac{3}{2} - w_j\right) \cdot (\omega - s) & \text{if } \omega < s \\ \gamma \cdot (\omega - s) & \text{else,} \end{cases} \tag{13} \]

where $s$ is output spikes per second, $\omega = 5$ Hz is the target rate and update factor $\gamma = 0.6$.
The offline update rule is firing rate, and not spike, dependent.

Classical STDP has a depotentiation bias to prevent runaway potentiation feedback loops leading to
seizures [5]. Since synapses are frequently renormalized offline we incorporate a weak exploratory
(potentiation) bias during the online phase which helps avoid local minima.³ This is in line with
experimental evidence showing increased cortical activity during waking [30].

Since computing the sum of synaptic weights is non-physiological, we draw on Theorem 1 and
use the neuron's firing rate when responding to uncorrelated inputs as a proxy for $\|w\|_1$. Thus,
in the offline phase, synapses receive inputs generated as in the online phase but without repeated
patterns. Note that (12) has a larger pruning effect on stronger synapses, discouraging specialization.
Motivated by Remark 4, we introduce bias $(\tfrac{3}{2} - w_j)$ in the offline phase to ensure weaker synapses
are downscaled more than strong synapses. For example, a synapse with $w_i = 0.5$ is downscaled
by twice as much as a synapse with weight $w_j = 1.0$.

Regularized STDP alternates between 2 seconds online and 4 seconds offline, which suffices to
renormalize synaptic strengths. The frequency of the offline phase could be reduced by decreasing
the update factors $\alpha_\pm$, presenting stimuli less frequently (than 7 times per second), or adding
inhibitory neurons to the system.
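
The offline phase can be sketched as a rate-dependent rescaling. The Python below is an illustrative reading of the offline rule, not the authors' code: it assumes the bias factor is 3/2 minus the weight (the value for which a synapse at 0.5 is downscaled twice as much as one at 1.0), and it assumes weights are clipped to [0, 1] after each update. The function names are hypothetical.

```python
def offline_delta(wj, rate, target=5.0, gamma=0.6):
    """Change applied to one synapse given the output firing rate (spikes/s).

    If the neuron fires above target, weak synapses are pushed down more
    than strong ones through the (1.5 - wj) factor; otherwise all synapses
    are pushed up by the same amount.
    """
    if target < rate:
        return gamma * (1.5 - wj) * (target - rate)
    return gamma * (target - rate)


def offline_update(w, rate, target=5.0, gamma=0.6):
    """Apply the offline rule to every synapse, clipping to [0, 1] (an assumption)."""
    return [min(1.0, max(0.0, wj + offline_delta(wj, rate, target, gamma))) for wj in w]


if __name__ == "__main__":
    print(offline_delta(0.5, rate=6.0), offline_delta(1.0, rate=6.0))
    print(offline_update([0.2, 0.9], rate=5.5))
```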



Results. A summary of results is presented in the table below: accuracy quantifies the fraction 
of spikes that co-occur with each pattern. Regularized STDP outperforms classical STDP on both 
patterns on average. It should be noted that regularized neurons were not only online for 20 seconds 
but also offline - and exposed to Poisson noise - for 40 seconds. Interestingly, exposure to Poisson 
noise improves performance. 



Algorithm      Accuracy
               Pattern 1    Pattern 2
Classical      54%          39%
Regularized    59%          48%



3 The input stream contains a repeated pattern, so there is a potentiation bias in practice even though the net 
integral of STDP in the online phase is negative. 
[Figure 1, panels (a) Classical STDP and (b) Regularized STDP: accuracy on pattern #1 against accuracy on pattern #2, with trial counts.]

Figure 1: Accuracy after 20 seconds of exposure to a novel pattern.

Fig. 1 provides a more detailed analysis. Each panel shows a 2D-histogram (darker shades of gray
correspond to more trials) plotting accuracies on both patterns simultaneously, and two 1D
histograms plotting accuracies on the two patterns separately. The 1D histogram for regularized STDP
shows a unimodal distribution for pattern #2, with most of the mass over accuracies of 50-90%. For
pattern #1, which has been "unlearned" for twice as long as the training period, most of the mass is
over accuracies of 50% to 90%, with a significant fraction "unlearnt". By contrast, classical STDP
exhibits extremely brittle behavior. It completely unlearns the original pattern in about half the trials,
and also fails to learn the new pattern in most of the trials.

Thus, as suggested by our analysis, introducing a regularization both improves the robustness of 
STDP and enables an exploratory bias by preventing runaway feedback leading to epileptic seizures. 



7 Discussion 



The selectron provides a bridge between a particular model of spiking neurons, the Spike
Response Model [20] with the original spike-timing dependent plasticity rule [5], and models that
are amenable to learning-theoretic analysis. Our hope is that the selectron and related models lead
to an improved understanding of the principles underlying learning in cortex. It remains to be seen
whether other STDP-based models also have tractable discrete-time analogs.

The selectron is an interesting model in its own right: it embeds reward estimates into spikes and
maximizes a margin that improves error bounds. It imposes a constraint on synaptic weights that
concentrates rewards/spike, tightens error bounds and improves guarantees on spiking efficacy.
Although the analysis does not apply directly to continuous-time models, experiments show that a
tweak inspired by our analysis improves the performance of a more realistic model. An important
avenue for future research is investigating the role of feedback in cortex, specifically NMDA
synapses, which may have interesting learning-theoretic implications.



Acknowledgements. We thank Timothee Masquelier for generously sharing his source code [32].

Appendices

A.1 The perceptron

We describe the perceptron to facilitate comparison with the selectron. 

The perceptron's loss function and learning rule are most naturally expressed when inputs and
outputs take values in $\{\pm 1\}$. We therefore present the perceptron in both ±1 and 0/1 "coordinate
systems".

Let us first relabel inputs and outputs from 0/1 to ±1: $A = 2X - 1$ and $B = 2Y - 1$. Given input
$a$, the perceptron's output is determined according to

\[ b = \rho_w(a) := \text{sign}(w^\top a), \]

where $w$ is a real-valued $N$-vector specifying the perceptron's synaptic weights.

Given supervisor $\sigma : A \to \{\pm 1\}$ that labels inputs as belonging to one of two classes, define
0/1-valued loss function for the perceptron as

\[ l_p(a, \rho_w, \sigma) := \begin{cases} 0 & \text{if } \rho_w(a) = \sigma(a) \\ 1 & \text{else.} \end{cases} \]

The following learning rule

\[ \Delta w_j = \alpha \cdot a_j \cdot \left(\sigma(a) - \rho_w(a)\right) = 2\alpha \cdot \begin{cases} a_j & \text{if } \sigma(a) = 1, \ \rho_w(a) = -1 \\ -a_j & \text{if } \sigma(a) = -1, \ \rho_w(a) = 1 \\ 0 & \text{else} \end{cases} \]

converges onto an optimal solution if the classes are linearly separable.

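The perceptron in "0/1 coordinates". For the sake of comparison, we reformulate the perceptron
in "0/1 coordinates". If $w_j \ge 0$ for all $j$, the mechanism of the perceptron is

\[ y = f_w(x) := H\left(w^\top x - \frac{\|w\|_1}{2}\right). \tag{A.14} \]

Similarly, we obtain loss function

\[ l_p(x, f_w, \sigma) := \begin{cases} 0 & \text{if } f_w(x) = 1, \ \sigma(x) = 1 \text{ or } f_w(x) = 0, \ \sigma(x) = -1 \\ 1 & \text{else} \end{cases} \]

and learning rule

\[ \Delta w_j = \alpha \cdot (2x_j - 1) \cdot \left(\sigma(x) - 2f_w(x) + 1\right) = 2\alpha \cdot \begin{cases} a_j & \text{if } \sigma(a) = 1, \ \rho_w(a) = -1 \\ -a_j & \text{if } \sigma(a) = -1, \ \rho_w(a) = 1 \\ 0 & \text{else.} \end{cases} \tag{A.15} \]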

Non-biological features of the perceptron. We highlight two features of the perceptron. First,
learning rule (A.15) is not selective. If the perceptron classifies an input incorrectly, it updates its
synaptic weights regardless of whether it outputted a 0 or a 1. It is thus not an output spike-dependent
learning rule. The main consequence of this is that the perceptron is forced to classify every input.
The selectron, by contrast, actively ignores neuromodulatory signals when it does not spike.

Second, the perceptron requires a local error signal. Multilayer perceptrons are constructed by
replacing the sign(·) in (A.14) with a differentiable function, such as the sigmoid, and
backpropagating errors. However, backpropagation requires two pathways: one for feedforward spikes and
another for feedback errors. In other words, the perceptron requires local error signals. A
dedicated error pathway is biologically implausible [33], suggesting that the cortex relies on alternate
mechanisms.
A.2 Proof of Theorem 1



The theorem requires computing the second moment of a selectron's total current, given i.i.d. 
Bernoulli inputs. We also compute the expectation and the variance since these are of intrinsic 
interest. 

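Lemma 5 (moments for i.i.d. Bernoulli inputs).

For Bernoulli i.i.d. inputs on synapses, i.e. $P(x_j = 1) = p$ for all $j$, we have

\[ E[\langle w, x \rangle - \vartheta] = p \cdot \|w\|_1 - \vartheta \]
\[ V[\langle w, x \rangle - \vartheta] = p(1 - p) \cdot \|w\|_2^2 \]
\[ E[\langle w, x \rangle^2] = p(1 - p) \cdot \|w\|_2^2 + p^2 \cdot \|w\|_1^2. \]

Proof. For the mean,

\[ E[\langle w, x \rangle] = \sum_{x \in X} P(x_1) \cdots P(x_N) \cdot \sum_{j=1}^{N} w_j x_j = p \cdot \sum_{j=1}^{N} w_j = p \cdot \|w\|_1 \]

since $w_j \ge 0$ for all $j$.
For the variance,

\[ V[\langle w, x \rangle] = \sum_{x \in X} P(x) \langle w, x \rangle^2 - p^2 \cdot \|w\|_1^2 = \sum_{i \ne j} p^2 w_i w_j + \sum_{j} p \, w_j^2 - p^2 \cdot \|w\|_1^2 = p(1 - p) \cdot \|w\|_2^2. \]

The expression for $E[\langle w, x \rangle^2]$ follows immediately.

□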
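
The moments in Lemma 5 can be verified by brute force for small $N$. The sketch below enumerates all $2^N$ binary inputs and computes the exact mean, variance and second moment of the total current; the function name `moments` and the example weights are illustrative.

```python
from itertools import product


def moments(w, p):
    """Exact mean, variance and second moment of w.x for i.i.d. Bernoulli(p) inputs."""
    mean = 0.0
    second = 0.0
    for x in product((0, 1), repeat=len(w)):
        prob = 1.0
        for xj in x:
            prob *= p if xj == 1 else 1.0 - p
        s = sum(wj * xj for wj, xj in zip(w, x))
        mean += prob * s
        second += prob * s * s
    return mean, second - mean ** 2, second


if __name__ == "__main__":
    w, p = [0.2, 0.5, 0.9], 0.3
    l1 = sum(w)                       # ||w||_1 for nonnegative weights
    l2sq = sum(wj * wj for wj in w)   # ||w||_2 squared
    print(moments(w, p))
    print(p * l1, p * (1 - p) * l2sq, p * (1 - p) * l2sq + p ** 2 * l1 ** 2)
```

The two printed lines should agree up to floating-point rounding.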



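Theorem 1. Assuming the inputs on each synapse are i.i.d. Bernoulli variables with P(spike) = $p$,
we have

\[ P(f_w(x) = 1) \le p \cdot \left(\frac{\|w\|_1}{\vartheta}\right)^2. \]

Proof. By Lemma 5, $E\langle w, x \rangle^2 = p(1 - p) \cdot \|w\|_2^2 + p^2 \cdot \|w\|_1^2$. Applying Chebyshev's inequality,

\[ P(|X| > \epsilon) \le \frac{E X^2}{\epsilon^2}, \]

obtains

\[ P(f_w(x) = 1) = P(\langle w, x \rangle > \vartheta) \le \frac{p(1 - p) \cdot \|w\|_2^2 + p^2 \cdot \|w\|_1^2}{\vartheta^2}. \]

The result follows since $\|w\|_2 \le \|w\|_1$.

□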



A.3 Proof of Theorem 2

We first compute the limit for STDP, and then move on to leaky integrate-and-fire neurons.

Lemma 6 (fast time constant limit of STDP).
The fast time constant limit of STDP, recall (5), is

\[ \lim_{\tau_\bullet \to 0} \Delta w_{jk} = (\alpha_+ - \alpha_-) \cdot \delta_0(\Delta t) \]

where $\delta_0$ is the Dirac delta.
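Proof. We start from the STDP curve:

\[ \Delta w_{jk} := \frac{\alpha_+}{\tau_+} \exp\left(\frac{t_j - t_k}{\tau_+}\right) H(t_k - t_j) - \frac{\alpha_-}{\tau_-} \exp\left(\frac{t_k - t_j}{\tau_-}\right) H(t_j - t_k) = \frac{\alpha_+}{\tau_+} \exp\left(\frac{-\Delta t}{\tau_+}\right) H(\Delta t) - \frac{\alpha_-}{\tau_-} \exp\left(\frac{\Delta t}{\tau_-}\right) H(-\Delta t) \]

where $H$ is the Heaviside function and $\Delta t := t_k - t_j$. Note that we divide by $\tau_\pm$ to ensure that the area is constant as
$\tau_\pm \to 0$. An equivalent approach is to rescale $\alpha$ and $\tau$ proportionately, so that both tend to zero.

Let us now compute the limit, in the sense of distributions, as $\tau_\pm \to 0$. Let $f$ be a test function in
the Schwartz space. The linear form associated to

\[ S_+(\Delta t) = \frac{\alpha_+}{\tau_+} \exp\left(\frac{-\Delta t}{\tau_+}\right) H(\Delta t) \]

is

\[ S_+ : f(x) \mapsto \int S_+(x) f(x) \, dx. \]

Thus,

\[ S_+(f) = \int \frac{\alpha_+}{\tau_+} \exp\left(\frac{-x}{\tau_+}\right) H(x) f(x) \, dx. \]

Integrating by parts obtains

\[ S_+(f) = \left[ -\alpha_+ \exp\left(\frac{-x}{\tau_+}\right) f(x) \right]_0^\infty - \int_0^\infty -\alpha_+ \exp\left(\frac{-x}{\tau_+}\right) f'(x) \, dx \]

so that

\[ S_+(f) = \alpha_+ \cdot f(0) + \tau_+ \cdot S_+(f') \]

and $\lim_{\tau_+ \to 0} [S_+(f)] = \alpha_+ \cdot f(0) = \alpha_+ \cdot \delta_0(f)$. A similar argument holds for the negative part of
the STDP curve, so the fast time constant limit is

\[ \Delta w_{jk} = (\alpha_+ - \alpha_-) \cdot \delta_0(\Delta t). \]

□

Remark 5. If STDP incorporates a potentiation bias (in the sense that the net integral is positive),
then the fast time constant limit acts exclusively by potentiation.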
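
The distributional limit can be illustrated numerically: pairing the rescaled potentiation kernel with a smooth test function should approach $\alpha_+ \cdot f(0)$ as $\tau_+$ shrinks. The Python sketch below uses a plain midpoint rule; the function name `smoothed_kernel`, the integration span and the step count are illustrative choices.

```python
import math


def smoothed_kernel(f, alpha, tau, steps=20000, span=40.0):
    """Midpoint-rule approximation of the integral of (alpha/tau) exp(-x/tau) f(x) over x >= 0.

    The integral is truncated at span * tau, where the exponential factor
    is already negligible.
    """
    h = span * tau / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (alpha / tau) * math.exp(-x / tau) * f(x) * h
    return total


if __name__ == "__main__":
    f = lambda x: math.exp(-x * x)   # smooth test function with f(0) = 1
    for tau in (1.0, 0.1, 0.01):
        print(tau, smoothed_kernel(f, alpha=1.0, tau=tau))
```

The printed values move towards 1 as `tau` decreases, matching the limit $\alpha_+ \cdot f(0)$ with $\alpha_+ = 1$.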



Theorem 2. In the fast time constant limit, $\lim \tau_\bullet \to 0$, the SRM transforms into a selectron with

\[ f_w(t) = H(M_w(t) - \vartheta) \quad \text{where} \quad M_w = \sum_{\{j \,|\, t_j \ge t_k\}} w_{jk} \cdot \delta_{t_k}(t). \]

Moreover, STDP transforms into learning rule (4) in the unsupervised setting with $\nu(x) = 1$ for all
$x$. Finally, STDP arises as gradient ascent on a reward function whose limit is the unsupervised
setting of reward function (2).



Proof. Setting K x = K 2 = 0, r s 



K 



e 2 , and taking the limit r m — > yields 



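$$M_{\mathbf{w}}(t) = \sum_{\{j \mid t_j \geq t_k\}} w_{jk} \cdot \delta_{t_k}(t).$$

By Lemma 6, taking limits $\tau_\pm \to 0$ transforms STDP into

$$\Delta w_{jk} = \begin{cases} (\alpha_+ - \alpha_-) & \text{if } t_j = t_k \\ 0 & \text{else,} \end{cases}$$

which is a special case of the online learning rule for the selectron derived in §2, where the neuromodulatory response is $\nu(\mathbf{x}) = 1$ for all $\mathbf{x}$.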
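To make the limiting rule concrete, here is a minimal discrete-time sketch. It is an illustration under stated assumptions rather than the paper's implementation: time is binned so that "$t_j = t_k$" becomes "synapse $j$ is active in a bin where the selectron spikes", the Heaviside function is taken to be 0 at 0, and the function names are hypothetical.

```python
def selectron_output(weights, inputs, threshold):
    """Return 1 if the weighted input exceeds the threshold, else 0.

    This plays the role of H(M_w - theta), with H(0) taken to be 0 (an assumption).
    """
    drive = sum(w * x for w, x in zip(weights, inputs))
    return 1 if drive > threshold else 0


def limit_stdp_update(weights, inputs, threshold, alpha_plus, alpha_minus):
    """Apply the fast-time-constant STDP rule in the unsupervised case (nu = 1).

    A weight changes by (alpha_plus - alpha_minus) only when its synapse is
    active (x_j = 1) in a bin where the selectron itself spikes.
    """
    out = selectron_output(weights, inputs, threshold)
    step = alpha_plus - alpha_minus
    return [w + step * x * out for w, x in zip(weights, inputs)]


if __name__ == "__main__":
    w = [0.5, 0.2, 0.4]
    print(limit_stdp_update(w, [1, 0, 1], 0.6, alpha_plus=0.1, alpha_minus=0.04))
    print(limit_stdp_update(w, [0, 1, 0], 0.6, alpha_plus=0.1, alpha_minus=0.04))
```

In the first call the selectron spikes, so only the two active synapses are updated; in the second call it stays silent and the weights are unchanged.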

STDP is easily seen to arise as gradient ascent on 



arg max N 

w 4 



E 

t k -i<tj<t k 



E 



(A.16)

Taking the limit of (A.16) yields the special case of (2) where $\nu(\mathbf{x}) = 1$ for all $\mathbf{x}$.

□
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Eq. ( |A.16| l can be expressed in the shorthand 



arg max 



(w t • d(t fc )) • u(t k ) 



where dj(t) = 



eV T + J if tj < t 



else 



A.4 Proof of Theorem 3

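To prove the theorem, we first recall an error bound from [24]. To state their result, we need the following notation. Let

$$\mathcal{C}_\omega = \left\{ f(x) = \mathrm{sign}\Big( \sum_{j=1}^{N} \alpha_j \cdot g_j(x) \Big) \;:\; \alpha_j \in \mathbb{R},\ \|\alpha\|_1 \leq \omega \text{ and } g_j \in \mathcal{C} \right\}$$

where $\mathcal{C}$ is a class of base classifiers taking values in $\pm 1$.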

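Let $\phi : \mathbb{R} \to \mathbb{R}_+$ be a nonnegative cost function such that $\mathbb{1}_{x > 0} \leq \phi(x)$, for example $\phi(x) = (1 + x)_+$. Define

$$L(f) = \mathbb{E}\,\mathbb{1}_{f(X)Y < 0} \quad \text{and} \quad L_n(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{f(X_i)Y_i < 0} \tag{A.17}$$

and

$$A(f) = \mathbb{E}\,\phi(-f(X)Y) \quad \text{and} \quad A_n(f) = \frac{1}{n} \sum_{i=1}^{n} \phi(-f(X_i)Y_i). \tag{A.18}$$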
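The empirical quantities $L_n$ and $A_n$ are easy to compute on a toy sample. The sketch below is illustrative only: it uses the cost $\phi(x) = (1 + x)_+$, which satisfies $\mathbb{1}_{x>0} \leq \phi(x)$, and the function names and sample values are assumptions made for the example.

```python
def hinge_cost(x):
    """phi(x) = (1 + x)_+, a cost that dominates the indicator of x > 0."""
    return max(0.0, 1.0 + x)


def empirical_error(scores, labels):
    """L_n(f): fraction of samples with f(X_i) * Y_i < 0."""
    return sum(1 for s, y in zip(scores, labels) if s * y < 0) / len(scores)


def empirical_cost(scores, labels, cost=hinge_cost):
    """A_n(f): average of phi(-f(X_i) * Y_i) over the sample."""
    return sum(cost(-s * y) for s, y in zip(scores, labels)) / len(scores)


if __name__ == "__main__":
    scores = [0.8, -0.3, 0.5, -1.2]   # hypothetical values f(X_i)
    labels = [1, 1, -1, -1]           # labels Y_i in {-1, +1}
    print(empirical_error(scores, labels), empirical_cost(scores, labels))
```

Because the cost dominates the indicator pointwise, the empirical cost is never smaller than the empirical error on any sample.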



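Theorem 7. Let $f$ be a function chosen from $\mathcal{C}_\omega$ based on data $(X_i, Y_i)_{i=1}^n$. With probability at least $1 - \delta$,

$$L(f) \leq A_n(f) + 2 L_\phi \cdot \mathbb{E}\,\mathcal{R}ad_n\big(\mathcal{C}(X_1^n \mid f = 1)\big) + B \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}, \tag{A.19}$$

where $L_\phi$ is the Lipschitz constant of $\phi$, $B$ is a uniform upper bound on $\phi(-f(x)y)$, and $\mathcal{R}ad(\mathcal{C}(X_1^n))$ is the Rademacher complexity of $\mathcal{C}$ on data $X_1^n$.

Proof. See §4.1 of [24]. □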

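We are interested in errors of commission, not omission, so we consider modified versions of (A.17),

$$\mathcal{L}(f) = \mathbb{E}\big[\mathbb{1}_{f(X)Y < 0} \cdot \mathbb{1}_{f > 0}\big] \quad \text{and} \quad \mathcal{L}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \big[\mathbb{1}_{f(X_i)Y_i < 0} \cdot \mathbb{1}_{f > 0}\big], \tag{A.20}$$

and (A.18),

$$\mathcal{A}(f) = \mathbb{E}\big[\phi(-f(X)Y) \cdot \mathbb{1}_{f > 0}\big] \quad \text{and} \quad \mathcal{A}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \big[\phi(-f(X_i)Y_i) \cdot \mathbb{1}_{f > 0}\big], \tag{A.21}$$

where we multiply by the indicator function $\mathbb{1}_{f > 0}$. This results in a discontinuous hinge function. We therefore have to modify the above theorem.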

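Theorem 8. Let $f$ be a function chosen from $\mathcal{C}_\omega$ based on data $(X_i, Y_i)_{i=1}^n$. With probability at least $1 - \delta$,

$$\mathcal{L}(f) \leq \mathcal{A}_n(f) + 2\big(\mathbb{E}[\mathbb{1}_f] \cdot L_\phi + B\big) \cdot \mathbb{E}\,\mathcal{R}ad_n\big(\mathcal{C}(X_1^n \mid f = 1)\big) + 2B \sqrt{\frac{2 \log \frac{2}{\delta}}{n}}$$

where $L_\phi$ is the Lipschitz constant of $\phi$, $B$ is a uniform upper bound on $\phi(-f(x)y)$, and $\mathcal{R}ad(\mathcal{C}(X_1^n))$ is the Rademacher complexity of $\mathcal{C}$ on data $X_1^n$.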
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Proof. We adapt the proof in [24]. By definition it follows that 

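$$\mathcal{L}(f) \leq \mathcal{A}(f) = \mathcal{A}_n(f) + \mathcal{A}(f) - \mathcal{A}_n(f).$$

Now observe that $\sum_x [p(x) f(x) \mathbb{1}_S] = \sum_{x \in S} p(x) f(x) = p(S) \sum_x q(x) f(x)$, where $q(x) = p(x \mid x \in S)$. Thus, changing distributions from $P(x)$ to $Q(x) = P(x \mid f = 1)$, we can write

$$\begin{aligned}
\mathcal{L}(f) &\leq \mathcal{A}_n(f) + P(f = 1) \cdot \mathbb{E}_Q[\phi(-f(X)Y)] - P_n(f = 1) \cdot \sum_{\{i : f(x_i) = 1\}} \frac{\phi(-f(x_i) y_i)}{|\{i : f(x_i) = 1\}|} \\
&= \mathcal{A}_n(f) + P(f = 1) \cdot \big[A_Q(f) - A_{Q_n}(f)\big] + A_{Q_n}(f) \cdot \big[\mathbb{E}_P f - \mathbb{E}_{P_n} f\big]
\end{aligned}$$

where $P_n(x_i) = \frac{1}{n}$ is the empirical distribution, $P_n(f = 1) = \frac{|\{i : f(x_i) = 1\}|}{n}$ and $Q_n(x_i) = \frac{1}{|\{i : f(x_i) = 1\}|}$.

Continuing,

$$\begin{aligned}
\mathcal{L}(f) &\leq \mathcal{A}_n(f) + \mathbb{E}_P[\mathbb{1}_f] \cdot \sup_{g \in \mathcal{C}} \big[A_Q(g) - A_{Q_n}(g)\big] + A_{Q_n}(f) \cdot \sup_{g \in \mathcal{C}} \big[\mathbb{E}_P g - \mathbb{E}_{P_n} g\big] \\
&\leq \mathcal{A}_n(f) + 2\,\mathbb{E}[\mathbb{1}_f] \cdot L_\phi \cdot \mathbb{E}\,\mathcal{R}ad_n\big(\mathcal{C}(X_1^n \mid f = 1)\big) + 2 A_{Q_n} \cdot \mathbb{E}\,\mathcal{R}ad_n\big(\mathcal{C}(X_1^n \mid f = 1)\big) + 2B \sqrt{\frac{2 \log \frac{2}{\delta}}{n}} \\
&\leq \mathcal{A}_n(f) + 2\big(\mathbb{E}[\mathbb{1}_f] \cdot L_\phi + B\big) \cdot \mathbb{E}\,\mathcal{R}ad_n\big(\mathcal{C}(X_1^n \mid f = 1)\big) + 2B \sqrt{\frac{2 \log \frac{2}{\delta}}{n}}
\end{aligned}$$

where we bound the two suprema with high probability separately using Rademacher complexity and apply the union bound. The last inequality follows since $A_{Q_n} \leq B$. □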
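The change of distribution used in the proof rests on the identity $\sum_x p(x) f(x) \mathbb{1}_S = p(S) \sum_x q(x) f(x)$ with $q(x) = p(x \mid x \in S)$. A small numerical check on a finite set, with hypothetical probabilities and function values, may help fix ideas; the function names are assumptions of this sketch.

```python
def restricted_sum(p, f, subset):
    """Left-hand side: sum of p(x) * f(x) over x in the subset S."""
    return sum(p[x] * f[x] for x in subset)


def conditional_form(p, f, subset):
    """Right-hand side: p(S) times the expectation of f under q(x) = p(x | x in S)."""
    mass = sum(p[x] for x in subset)
    q = {x: p[x] / mass for x in subset}
    return mass * sum(q[x] * f[x] for x in subset)


if __name__ == "__main__":
    p = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}   # hypothetical distribution
    f = {"a": 1.0, "b": -2.0, "c": 0.5, "d": 3.0}  # hypothetical function values
    subset = ["b", "d"]
    print(restricted_sum(p, f, subset), conditional_form(p, f, subset))
```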

Remark 6. The change of distribution is not important, since the Rademacher complexity is bounded by the distribution-free VC-dimension in Lemma 9.

To recover the setting of Theorem 3 we specialize to $\phi_\kappa(f) = (\kappa - f(X)Y)_+$, for $\kappa \geq 1$, so that the hinge loss can be written as $h^\kappa = \phi_\kappa(\mathbf{w}^\top \mathbf{x} - \vartheta) \cdot \mathbb{1}_{f(\mathbf{x}) > 0}$. The Lipschitz constant is $L_\phi = 1$.

Now let 



N 



w|| i < u> and j 3 e J 



where functions in $\mathcal{F}$ take values in 0/1.

Function class $\mathcal{F}_\omega$ denotes functions implementable by a two-layer network of selectrons $S^k = \{n^k\} \cup \{n^j : n^j \to n^k\}$. The outputs of $g_j$ are aggregated, so that the function class $\mathcal{F}_\omega$ of subnetwork $S^k$ is larger than that of selectrons considered individually, i.e. $\mathcal{F}$.

Lemma 9. The Rademacher complexity of $\mathcal{F}_\omega$ is upper bounded by

$$\mathcal{R}ad_n(\mathcal{F}_\omega) \leq \omega \cdot \frac{\sqrt{2(N+1)\log(n+1)} + \frac{1}{2}}{\sqrt{n}}.$$



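Proof. Given a selectron $f_{\mathbf{w}}$ with inputs $g_j(\mathbf{x})$, let $\tilde{g}_j := 2 g_j - 1$ be the corresponding $\{\pm 1\}$-valued function. Then

$$\sum_{j=1}^{N} w_j g_j(\mathbf{x}) - \vartheta = \frac{1}{2} \sum_{j=1}^{N} w_j \big(\tilde{g}_j(\mathbf{x}) + 1\big) - \vartheta = \frac{\|\mathbf{w}\|_1 - 2\vartheta}{2} + \frac{1}{2} \sum_{j=1}^{N} w_j \tilde{g}_j(\mathbf{x}),$$

since $w_j \geq 0$ for all $j$. Thus,

$$\mathcal{R}ad_n(\mathcal{F}_\omega) \leq \frac{|\omega - 2\vartheta|}{2\sqrt{n}} + \omega \cdot \mathcal{R}ad_n(\mathcal{F}).$$

The Rademacher complexity of $\mathcal{F}$ is upper bounded in terms of the VC-dimension $V_{\mathcal{F}}$,

$$\sqrt{\frac{2 V_{\mathcal{F}} \log(n + 1)}{n}},$$

where $V_{\mathcal{F}}$ for a selectron with $N$ synapses is at most $N + 1$.

Finally, note that if $\vartheta < 0$ then the selectron always spikes, and if $\vartheta > \|\mathbf{w}\|_1$ then it never spikes. We therefore assume that $0 \leq \vartheta \leq \omega$, which implies $|2\vartheta - \omega| \leq \omega$ and so

$$\frac{|\omega - 2\vartheta|}{2} \leq \frac{\omega}{2}.$$

□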
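To get a feel for the size of the bound in Lemma 9, the sketch below evaluates $\omega \cdot (\sqrt{2(N+1)\log(n+1)} + \tfrac{1}{2}) / \sqrt{n}$ for a few sample sizes. It is only an illustration: the function names and the numerical values of $\omega$, $N$ and $n$ are hypothetical, and the VC-type term is taken to be $\sqrt{2V\log(n+1)/n}$.

```python
import math


def vc_rademacher_bound(vc_dim, n):
    """VC-type bound sqrt(2 * V * log(n + 1) / n) on the Rademacher complexity."""
    return math.sqrt(2.0 * vc_dim * math.log(n + 1) / n)


def selectron_rademacher_bound(omega, n_synapses, n):
    """Bound of Lemma 9: omega * (sqrt(2 (N + 1) log(n + 1)) + 1/2) / sqrt(n)."""
    numerator = math.sqrt(2.0 * (n_synapses + 1) * math.log(n + 1)) + 0.5
    return omega * numerator / math.sqrt(n)


if __name__ == "__main__":
    omega, n_synapses = 3.0, 200  # hypothetical total weight and synapse count
    for n in (100, 1000, 10000):
        print(n, selectron_rademacher_bound(omega, n_synapses, n))
```

The bound is linear in the total synaptic weight $\omega$ and decays roughly like $\sqrt{\log n / n}$ in the number of samples.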

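Theorem 3. Suppose each selectron has $\leq N$ synapses. For any selectron $n^k$, let $S^k = \{n^k\} \cup \{n^j : n^j \to n^k\}$ denote a 2-layer feedforward subnetwork. For all $\kappa \geq 1$, with probability at least $1 - \delta$,

$$\underbrace{\mathbb{E}\big[l(\mathbf{x}, f_{\mathbf{w}}, \nu)\big]}_{\text{0/1 loss}} \leq \underbrace{\frac{1}{n} \sum_{i} h^\kappa\big(\mathbf{x}^{(i)}, f_{\mathbf{w}}, \nu(\mathbf{x}^{(i)})\big)}_{\text{hinge loss}} + \underbrace{\omega \cdot 2B \cdot \frac{\sqrt{8(N+1)\log(n+1)} + 1}{\sqrt{n}}}_{\text{capacity term}} + \underbrace{2B \sqrt{\frac{2 \log \frac{2}{\delta}}{n}}}_{\text{confidence term}}$$

Proof. Applying Theorem 8 and Lemma 9, and noting that $\mathbb{E}[\mathbb{1}_f] \leq 1 \leq B = \kappa + \omega - \vartheta$, where $B$ is an upper bound on the hinge loss, obtains the result. □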

A.5 An alternate error bound with hard margins 

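For the sake of completeness, we prove Corollary 11, an alternative to Theorem 3 that bounds a symmetric 0/1 loss:

$$\mathbb{1}_{(2 f_{\mathbf{w}}(\mathbf{x}) - 1) \cdot \nu(\mathbf{x}) < 0} = \begin{cases} 1 & \text{if } f_{\mathbf{w}}(\mathbf{x}) = 1,\ \nu(\mathbf{x}) = -1 \\ 1 & \text{if } f_{\mathbf{w}}(\mathbf{x}) = 0,\ \nu(\mathbf{x}) = 1 \\ 0 & \text{else.} \end{cases}$$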

The loss penalizes the selectron when either (i) it fires when it shouldn't or (ii) it doesn't fire when 
it should. 
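The case analysis above can be written out directly. The following minimal sketch assumes, as in the display, that the selectron output takes values in 0/1 and the neuromodulatory signal takes values in $\pm 1$; the function name is hypothetical.

```python
def symmetric_loss(fires, nu):
    """Return 1 if (2 * fires - 1) * nu < 0, else 0.

    fires is the selectron output in {0, 1}; nu is the neuromodulatory
    signal in {-1, +1}.
    """
    return 1 if (2 * fires - 1) * nu < 0 else 0


if __name__ == "__main__":
    for fires in (0, 1):
        for nu in (-1, 1):
            print(fires, nu, symmetric_loss(fires, nu))
```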

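We replace the modified hinge loss in Theorem 3 with a hard-margin loss. Following [24], let

$$L_n^\gamma(f) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{f(X_i)Y_i < \gamma} \quad \text{and} \quad \phi_\gamma(x) = \begin{cases} 0 & \text{if } x \leq -\gamma \\ 1 & \text{if } x \geq 0 \\ 1 + x/\gamma & \text{else.} \end{cases}$$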
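As a quick illustration of these two definitions, the sketch below implements the ramp cost $\phi_\gamma$ and the empirical margin error $L_n^\gamma$ on a toy sample. The function names and sample values are assumptions made for the example.

```python
def ramp_cost(x, gamma):
    """phi_gamma(x): 0 for x <= -gamma, 1 for x >= 0, linear (1 + x/gamma) in between."""
    if x <= -gamma:
        return 0.0
    if x >= 0:
        return 1.0
    return 1.0 + x / gamma


def margin_error(scores, labels, gamma):
    """L_n^gamma(f): fraction of samples whose margin f(X_i) * Y_i is below gamma."""
    return sum(1 for s, y in zip(scores, labels) if s * y < gamma) / len(scores)


if __name__ == "__main__":
    scores = [0.8, -0.3, 0.5, -1.2]   # hypothetical values f(X_i)
    labels = [1, 1, -1, -1]           # labels Y_i in {-1, +1}
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, margin_error(scores, labels, gamma))
```

The ramp has slope $1/\gamma$ on its linear part, so a larger $\gamma$ gives a smaller Lipschitz constant while counting more samples as margin errors.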



The following corollary of Theorem 7 is shown in [24].

Corollary 10. Let $f_n$ be a function chosen from $\mathcal{C}_\omega$. For any $\gamma > 0$, with probability at least $1 - \delta$,



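where $V_{\mathcal{C}}$ is the VC-dimension of $\mathcal{C}$.

We use Corollary 10 to derive an alternate error bound for the selectron. Introduce the hard-margin loss

$$\mathbb{1}_{(\mathbf{w}^\top \mathbf{x} - \vartheta) \cdot \nu(\mathbf{x}) < \gamma} = \begin{cases} 0 & \text{if } \mathrm{sign}(\mathbf{w}^\top \mathbf{x} - \vartheta) = \mathrm{sign}(\nu(\mathbf{x})) \text{ and } |\mathbf{w}^\top \mathbf{x} - \vartheta| \geq \gamma \\ 1 & \text{else.} \end{cases}$$

The following error bound exhibits a trade-off on the 0/1 loss of a selectron that is controlled by $\gamma$.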
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Corollary 11 (error bound with hard margins). 

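For any $\gamma > 0$, with probability at least $1 - \delta$,

$$\mathbb{E}\,\mathbb{1}_{(2 f_{\mathbf{w}}(\mathbf{x}) - 1) \cdot \nu(\mathbf{x}) < 0} \leq \frac{1}{n} \sum_{i} \mathbb{1}_{(\mathbf{w}^\top \mathbf{x}^{(i)} - \vartheta) \cdot \nu(\mathbf{x}^{(i)}) < \gamma} + \frac{\omega}{\gamma} \cdot \frac{\sqrt{8(N+1)\log(n+1)} + 1}{\sqrt{n}} + \sqrt{\frac{2 \log \frac{1}{\delta}}{n}}.$$

Proof. Follows from Lemma 9 and Corollary 10. □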



As $\gamma$ increases, the size of the margin required to avoid counting towards a loss increases. However, the capacity term is multiplied by $\frac{1}{\gamma}$, and so reduces as the size of $\gamma$ increases.

Thus, the larger the margin, on average, the higher the value we can choose for $\gamma$ without incurring a penalty, and the better the bound in Corollary 11.

A.6 Proof of Theorem 4



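It is helpful to introduce some notation. Given $\mathbf{x} \in X$, let $S^j_a = \{\mathbf{x} \mid x_j = a\}$. Let $\nu_{jk=11}$, defined by the equation

$$\nu_{jk=11} \cdot \sum_{\mathbf{x} \in S^j_1} P(\mathbf{x} \mid x_j = 1)(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) = \sum_{\mathbf{x} \in S^j_1} P(\mathbf{x} \mid x_j = 1)\,\nu(\mathbf{x})(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}),$$

quantify the average contribution of neuromodulators when selectrons $n^j$ and $n^k$ both spike. Similarly, let $\nu_{jk=01}$ quantify the average contribution of neuromodulators when $n^k$ spikes and $n^j$ does not, i.e. the sum over $\mathbf{x} \in S^j_0$.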

If upstream neurons are reward maximizers then spikes by $n^j$ should predict higher neuromodulatory rewards, on average, than not firing. We therefore assume

$$\nu_{jk=11} \geq \nu_{jk=01}. \tag{*}$$



Theorem 4. Let $p_j := \mathbb{E}[Y^j]$ denote the frequency of spikes from neuron $n^j$. Under assumption (*), the efficacy of $n^j$'s spikes on $n^k$ is lower bounded by



I 5R k wj ■ E[Y 3 Y k ] 



2E 



Y j Y k ■ ((wtyx-t?)] E\Y k ■ ((w\)t x -?9) 



where 



Pj 



Wj ifi ^ j 
else. 



l- Pj 



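Proof.

$$\begin{aligned}
\frac{\delta R^k}{\delta x_j} &= \mathbb{E}[R^k \mid x_j = 1] - \mathbb{E}[R^k \mid x_j = 0] \\
&= \sum_{\mathbf{x} \in S^j_1} P(\mathbf{x} \mid x_j = 1)\,\nu(\mathbf{x})(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) - \sum_{\mathbf{x} \in S^j_0} P(\mathbf{x} \mid x_j = 0)\,\nu(\mathbf{x})(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) \\
&= \nu_{jk=11} \cdot \sum_{\mathbf{x} \in S^j_1} P(\mathbf{x} \mid x_j = 1)(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) - \nu_{jk=01} \cdot \sum_{\mathbf{x} \in S^j_0} P(\mathbf{x} \mid x_j = 0)(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}).
\end{aligned}$$

Assumption (*) implies

$$\begin{aligned}
\frac{1}{\nu_{jk=11}} \frac{\delta R^k}{\delta x_j} &\geq \sum_{\mathbf{x} \in S^j_1} P(\mathbf{x} \mid x_j = 1)(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) - \sum_{\mathbf{x} \in S^j_0} P(\mathbf{x} \mid x_j = 0)(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x}) \\
&= \frac{1}{p_j}\,\mathbb{E}\big[Y^j Y^k \cdot (\mathbf{w}^\top \mathbf{x} - \vartheta)\big] - \frac{1}{1 - p_j}\,\mathbb{E}\big[(1 - Y^j) Y^k \cdot (\mathbf{w}^\top \mathbf{x} - \vartheta)\big].
\end{aligned}$$

The result follows by direct computation. □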
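The first line of the proof, $\delta R^k / \delta x_j = \mathbb{E}[R^k \mid x_j = 1] - \mathbb{E}[R^k \mid x_j = 0]$, can be evaluated exactly on a small discrete input distribution. The sketch below is illustrative only: it takes the reward to be $\nu(\mathbf{x})(\mathbf{w}^\top \mathbf{x} - \vartheta) f_{\mathbf{w}}(\mathbf{x})$ as in the sums above, and the function names, weights, threshold, and input distribution are hypothetical.

```python
def reward(x, weights, threshold, nu):
    """R(x) = nu(x) * (w^T x - theta) * f_w(x), with f_w(x) = 1 iff w^T x > theta."""
    margin = sum(w * xi for w, xi in zip(weights, x)) - threshold
    fires = 1 if margin > 0 else 0
    return nu(x) * margin * fires


def spike_efficacy(samples, j, weights, threshold, nu):
    """E[R | x_j = 1] - E[R | x_j = 0] for a list of (input, probability) pairs."""
    def conditional_mean(value):
        mass = sum(p for x, p in samples if x[j] == value)
        weighted = sum(p * reward(x, weights, threshold, nu)
                       for x, p in samples if x[j] == value)
        return weighted / mass
    return conditional_mean(1) - conditional_mean(0)


if __name__ == "__main__":
    # Uniform distribution over two binary inputs (hypothetical example).
    samples = [((0, 0), 0.25), ((0, 1), 0.25), ((1, 0), 0.25), ((1, 1), 0.25)]
    print(spike_efficacy(samples, 0, [0.6, 0.6], 0.5, lambda x: 1.0))
```

In this toy example a spike on input 0 raises the expected reward of the downstream selectron, so its efficacy is positive.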

A.7 Generation of Poisson spike trains 



Spike trains are generated following p2[ . Neurons have 200 synaptic inputs. Time is discretized 
into 1ms bins. At each time step spikes are generated according to a Poisson process where the rate 
varies as follows: 

1. a synapse has probability r · dt of emitting a spike, where r is clipped in [0, 90] Hz.

2. dr = s · dt, where s is clipped in [−1800, 1800] Hz.

3. the rate of change ds of s is picked uniformly from [−360, 360] Hz.

The resulting spike train has an average firing rate of about MHz. Masquelier et al also add a 
mechanism to ensure each synapse transmits a spike after at most 50ms which we do not implement. 
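The three steps above can be sketched as a short generator. This is a minimal illustration, not the code used for the experiments: the initial rates (uniform in [0, 90] Hz), the initial rate of change (zero), the choice to add ds to s once per time step, and all function and parameter names are assumptions. As in the text, the extra mechanism of Masquelier et al. is not implemented.

```python
import random


def generate_spike_trains(n_synapses=200, n_steps=1000, dt=0.001, seed=0):
    """Generate binary spike trains with slowly varying Poisson rates.

    Returns a list of n_steps rows, each a list of n_synapses values in {0, 1}.
    dt is the bin width in seconds (1 ms bins by default).
    """
    rng = random.Random(seed)
    rates = [rng.uniform(0.0, 90.0) for _ in range(n_synapses)]  # assumed start
    slopes = [0.0] * n_synapses                                  # assumed start
    spikes = []
    for _ in range(n_steps):
        row = []
        for i in range(n_synapses):
            # Step 1: spike with probability r * dt.
            row.append(1 if rng.random() < rates[i] * dt else 0)
            # Step 3: perturb s by a uniform draw, then clip it.
            slopes[i] += rng.uniform(-360.0, 360.0)
            slopes[i] = min(1800.0, max(-1800.0, slopes[i]))
            # Step 2: dr = s * dt, with r clipped to [0, 90] Hz.
            rates[i] = min(90.0, max(0.0, rates[i] + slopes[i] * dt))
        spikes.append(row)
    return spikes


if __name__ == "__main__":
    trains = generate_spike_trains(n_synapses=200, n_steps=1000, seed=0)
    total = sum(sum(row) for row in trains)
    print("empirical mean rate (Hz):", total / (200 * 1000 * 0.001))
```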

Repeated patterns are sampled from the process above for 50ms. The pattern is then cut-and-pasted
over the original Poisson spike train for | of the total number of 50ms blocks. The cut-and-pasting 
is imposed on a randomly chosen (but fixed) subset containing | of the neuron's synapses. 
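The cut-and-paste step can be sketched as follows. The two fractions (of blocks and of synapses) are left as explicit parameters, `block_fraction` and `synapse_fraction`, which are hypothetical names; the function name, the random choice of blocks, and the seed handling are likewise assumptions of this sketch.

```python
import random


def embed_pattern(spikes, pattern, block_fraction, synapse_fraction, seed=0):
    """Paste a fixed pattern over a fraction of the blocks of a spike train.

    spikes and pattern are lists of rows (time steps) of 0/1 values; the
    pattern length defines the block length. Only a fixed random subset of
    synapses is overwritten; the input spike train is left unmodified.
    """
    rng = random.Random(seed)
    block_len = len(pattern)
    n_blocks = len(spikes) // block_len
    n_synapses = len(spikes[0])
    chosen_synapses = rng.sample(range(n_synapses), int(synapse_fraction * n_synapses))
    chosen_blocks = rng.sample(range(n_blocks), int(block_fraction * n_blocks))
    result = [row[:] for row in spikes]  # copy so the original is untouched
    for b in chosen_blocks:
        for t in range(block_len):
            for i in chosen_synapses:
                result[b * block_len + t][i] = pattern[t][i]
    return result, sorted(chosen_blocks), sorted(chosen_synapses)


if __name__ == "__main__":
    background = [[0] * 10 for _ in range(200)]   # 200 time steps, 10 synapses
    pattern = [[1] * 10 for _ in range(50)]       # one 50-step pattern
    result, blocks, synapses = embed_pattern(background, pattern, 0.5, 0.4, seed=3)
    print(blocks, synapses)
```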
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